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ABSTRACT 

We present here a fast code for creating a synthetic survey of the Milky Way. Given one or more color- 
magnitude bounds, a survey size and geometry, the code returns a catalog of stars in accordance with a given 
model of the Milky Way. The model can be specified by a set of density distributions or as an N-body real- 
ization. We provide fast and efficient algorithms for sampling both types of models. As compared to earlier 
sampling schemes which generate stars at specified locations along a line of sight, our scheme can generate a 
continuous and smooth distribution of stars over any given volume. The code is quite general and flexible and 
can accept input in the form of a star formation rate, age metallicity relation, age velocity dispersion relation 
and analytic density distribution functions. Theoretical isochrones are then used to generate a catalog of stars 
and support is available for a wide range of photometric bands. As a concrete example we implement the 
Besan9on Milky Way model for the disc. For the stellar halo we employ the simulated stellar halo N-body 
models of Bullock & Johnston (2005). In order to sample N-body models, we present a scheme that disperses 
the stars spawned by an N-body particle, in such a way that the phase space density of the spawned stars is con- 
sistent with that of the N-body particles. The code is ideally suited to generating synthetic data sets that mimic 
near future wide area surveys such as GAIA, LSST and HERMES. As an application we study the prospect 
of identifying structures in the stellar halo with a simulated GAIA survey. We plan to make the code publicly 
available at http : / /galaxia . sourcef orge . net, 

Subject headings: Galaxy: stellar content - structure- methods: data analysis - numerical 
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1. INTRODUCTION 

Generating a synthetic catalog of stars in accordance with a 
given model of galaxy formation has a number of uses. First, 
it helps to interpret the observational data. Secondly, it can 
be used to test the theories upon which the models are based. 
Moreover synthetic catalogs can be used to test the capabili- 
ties of different instruments, check for systematics and device 
strategies to reduce measurement errors. This is well under- 
stood by the architects of galaxy redshift surveys who rely 
heavily on ACDM simulati ons to remove ar tifacts imposed 
by the observing strategy ( Colless et al.ll200l|) . 

Given the widespread use of synthetic catalogs, a need for 
faster and accurate methods to generate stellar synthetic cat- 
alogs has recently arisen due to the advent of large scale sur- 
veys in astronomy, e.g., future surveys like LSST and GAIA 
have plans to measure over 1 billion stars. In order to gener- 
ate a synthetic catalog, one first needs to have a model of the 
Milky Way. While we are far from a dynamically consistent 
model, a working framework is fundamental to progress. In- 
evitably, this will require approximations or assumptions that 
may not be mutually consistent. Cosmologists already accept 
such compromises when they relate the observed galaxies to 
the dark-matter test particles that emerge from cosmological 
simulations. 

There have been various attempts over the past few decades 

^ Leverhulme Visiting Professor, and Merton College Fellow, University 
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to create a Galaxy model that is constr ained by observa- 
tions. Th e earl iest such attempt was by iBahcall & Soneiral 
(ll980allR 119841) where they assumed an exponential disc 
with magnitude dependent scale heights. An evolutionary 
model using population synthesis techniques was presented 
by R obin & Cre ze (1986). Given a star formation rate (SFR) 
and an initial mass function (IMF), one calculates the result- 
ing stellar populations using theoretical evolutionary tracks. 
Local o bservations were th en used to constrain the SFR and 
IMF. Bienayme et al.l (I1987D later introduced dynamical self 
consistency to constrain the disc scale height. The present 
state of the art is described in lRobin et aP (l2003h and is known 
as the Besan9on model. Here the disc is constructed from 
a set of isothermal populations that are assumed to be in 
equilibrium. Analytic functions for density distributions, the 
age/metallicity relation and the IMF are provided for each 
population. A simila r scheme is also u sed by the photometric 
code TRILEGAL ( Girardi et al.ll200 5h. 

In spite of its popularity, the current Besan9on model has 
important shortcomings. A web interface exists to generate 
synthetic catalogs from the model but it has limited applica- 
bility for generating wide area surveys. Discrete step sizes 
for radial, and angular coordinates need to be specified by the 
user and results might differ depending upon the chosen step 
size. The scale height and the velocity dispersion of the disc 
are in reality a function of age but, due to computational com- 
plexity, the disc is modeled as a finite set of isothermal discs 
of different ages. Increasing the number of discs enhances 
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the s moothness of the model but at the price of computational 
cost (iGirardi et al.l l2QQ5). 

In addition to the disc components, one also needs a model 
of the stellar halo. Under the hierarchical structure forma- 
tion paradigm, a significant fraction of the stellar halo is 
thought to have been produced by accretion events and signa- 
tures of these should be visible as substructures in the stellar 
halo. Missions like GAIA, LSST and PanSTARRS are be- 
ing planned that will enable us to detect substructures in the 
stellar halo. 

A smooth analytic stellar halo as in the Besan^on model is 
inadequate for testing schemes of substructure detection. Fur- 
thermore, such a halo does not accommodate known struc- 
tures like the Sagittarius dwarf stream w hich may consti- 
tute a large frac tion of the present halo (llbata et al.l 119951: 
IChou et al]|2QlQl) . Substructures have complex shapes and 
hence to model them we cannot use the approach of analytic 
density distributions as discussed earlie r However, N-body 
models are ideally suited for this task. iBrown et^ (l2005b 
attempted to combine a smooth galaxy model with some sim- 
ulated N-body models of disrupting satellites, but the stel- 
lar halo was not simulated in a proper cosmological context. 
Using hybrid N-body techniques. Bullock & Johnston ( 2005) 
have produced high resolution N body models of the stellar 
hal o that are siniulated w ithi n a cosmological c ontext; see 
also lCooper eTaH (I2010bh and lDe Lucia & Helmil (^008) for a 
similar approach. These can be used to make accurate predic- 
tions of the substructures in the stellar halo and also test the 
ACDM paradigm. However, as highlighted by iBrown et"an 
([2005) there are several unresolved issues related to sampling 
of an N-body model and this has prevented their widespread 
use. 

The aim of this paper is to present fast and accurate meth- 
ods to convert analytic and N-body models of a galaxy into 
a synthetic catalog of stars. This would relieve the burden of 
generating catalogs from modelers on one hand and on the 
other hand would allow the testing of models generated by 
different groups. This should also facilitate rapid testing of 
new models. 

We present a new scheme for sampling the analytical mod- 
els that enables us to generate continuous values of the vari- 
ables like position and age of stars. Instead of a set of discs 
at specified ages, our methodology allows us to generate a 
disc that is continuous in age. As a concrete example, we 
use the Besan^on analytical model for the disc. To model the 
disc kinematics more accurately, we employ the S hul (11969!) 
distribution function that describes the non-circular motion in 
the plane of the disc. F or the stellar halo w e use the simu- 
lated N-body models of iBuUock & JohnstonI (12005 ) that can 
reproduce the substructure in the halo. We show a scheme for 
sampling the N-body particles such that the sampled stars pre- 
serve the underlying phase space density of N-body particles. 



priori. This can be expressed in general as 

/i=/j(r,v,r,Z,m), (1) 

j being the label of the component. An accurate form of 
Equation ([T]) that describes all the properties of the Galaxy 
and is self consistent is still an open question. However, over 
the past few decades considerable progress has been made to 
arrive at a working model using a few simple assumptions 
(Robin & Creze 1986; Bienavme et al. 1987; Ha ywood et al.l 
1997a b; Girardi et al. 2005; Robin et al. 2003). Our analyt- 
ical framework is based upon these models and we describe 
this below. 

For a given galactic component, let the stars be formed 
at a rate of ^(r) and the mass distribution of stars ^(m,r) 
(IMF) be a parameterized function of age r only. Also, let the 
present day spatial distribution of stars, /pos(r, r) be a func- 
tion age only. Finally, assuming that the velocity distribution 
/vei(v, r, r) and the metallicity distribution, fz{Z^ r, r) are a 
function of age and position only one arrives at a model of the 
form (label j omitted for brevity) 



/i 



*(r) 

(m) 



^(^, ^)/pos(r, r)/vei(v, r, r)fz{Z, r, r) (2) 



Note, the functions on the right hand side can potentially 
take different forms for different galactic components, e.g., 
the thin disc, the thick disc and so on. The IMF here is 
normalized such that J^"!'"'' ^(^7 r)dm = 1 and (m) = 
jm^ax jyi^(^jji^(jijji is tj^e mean stellar mass. We now discuss 
the functional forms of the metallicity and the velocity distri- 
bution. 

The distribution fz is modeled as a log-normal distribution, 

= 1 ^-(logZ-logZ(r))/(2af,^^(r))^ 



c^iogzWv27r 

the mean and dispersion of which is given by an age- 
dependent function. The mean metallicity as function of 
age, Z{t), is popularly known as the age-metallicity relation 
(AMR). In general a spatial dependence might also be added 
to Equation ([3]), e.g., d\og{Z) / dR =constant. 

The velocity distribution /vei can be modeled as a triaxial 
Gaussian, 



/vel '- 



-exp 



24(r,i?) 2a2(r,i?) 



' (^0 - ^circ(^) - ^ad(T, K)f 



exp 



2<Tl(r,i?) 



(4) 



where, 0, z are the cylindrical coordinates, Vcirc(^) the cir- 
cular velocity as a function of cylindrical radius R and ^ad the 
asymmetric drift, which is given by the Stro mberg's relation 
(equation 4.228 in lBinnev & Tremaind[2QQ8h 



2. METHODS 

2.1. Analytic framework for modeling the galaxy 

We first describe the analytic framework that is used by us 
for modeling the Galaxy. The stellar content of the Galaxy is 
modeled as a set of distinct components, e.g., the thin disc, 
the thick disc, the stellar halo and the bulge. The distribu- 
tion function, i.e., the number density of stars as a function of 
position (r), velocity (v), age (r), metallicity (Z), and mass 
(m) of stars for each component is assumed to be specified a 



2v, 



dhiR d\nR 



Alternative ly, one can model the distribution of vs using th e 
IShul (Il969h distribution function (ISchonrich & Binnevl[2QQ9h . 
this is described in Appendix lAl The dispersions of the i?, z 
and (j) components of velocity increase as a function age due 
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to secular heating in the disc and moreover there is also radial 
dependence, the dispersi on is hig her close to the center. We 
model these effects as in iBinnevI jlOlQi) usinH the following 
functional form 



thi] 

^R4 



-^thi] 



(R) 



Vthin 

^0 



where H is the surface dens ity of the disc (age dependence 
from Aumer & Binnevll2Q09l) . 

The model as given by Equation ^ has some limitations. 
For example, Equation Q by itself is not dynamically self 
consistent and this has to be imposed externally. For a given 
^circ(^), ^('?") and crz{r) the vertical structure of the disc in 
the solar neighborhood can be made dynamically se lf consis- 
tent by using the app roach of iBienayme et al.l (11987 ) (see also 
I Just & JahreiBll20rO ). This leads to a constraint of the form 
e = e(r), e being the ellipticity of a homoeoid disc (or scale 
height hz = hz{r) for a double exponential disc). 

The second limitation of Equation Q is that it does not 
have an explicit dependence on the birth radii of a star. Re- 
cently semi-analytic models of the Galaxy that treat chem- 
ical evolution with radial mixing have been proposed by 
ISchonrich & Binneyl (|2009|) where the properties of a star are 
linked to their birth radii. A possible way to accommodate 
such models is to introduce in Equation © an additional pa- 
rameter Ri, the radius at which a star is born in cylindrical 
coordinates. The model distribution function is then given by 

/pos(r, r)/vei(v, r, Ri, r)fz{Z, Ri.r), (7) 

/pos (r, r) being the present day spatial distribution of 
stars that were born r years ago at radius Ri . 

2.2. Sampling an analytic model: adaptive von Neumann 
rejection technique 

Having specified the analytical framework we now discuss 
the scheme to sample stars from a model specified within this 
framework. In the present paper we restrict ourselves to mod- 
els of the form of Equation Q but the scheme we discuss is 
equally applicable to extensions of the form of Equation (|7]). 

For a model given by Equation ©, if r and r are given, 
sampling the metallicity distribution and the velocity distri- 
bution is trivial. Similarly, sampling the mass of a star is also 
straightforward as it is a one-dimensional distribution. Once 
the age, metallicity and mass of a star are known, a synthetic 
library of isochrones can then be used to generate stellar pa- 
rameters like color, magnitude, gravity and so on. So the key 
task is to sample the position and age coordinates from a spec- 
ified analytical function 



^(r)/pos(r,r) 
(m) 



(8) 



which is multidimensional. 

A simple approach to sample points from an analytical mul- 
tidimensional distribution as given by Equation ([8]) is to use 
the von Neumann rejection technique — generate a star hav- 
ing random coordinates r, r; then generate a random num- 
ber X between and ^max and finally accept the star if x < 
^(r, r). This naive approach is computationally very expen- 
sive. A lot of computational effort is spent to sample the low 



density regions that require rejection of a lot of stars. More- 
over, the computational time does not scale with output sam- 
ple size, so to generate even a small specific sample of stars, 
e.g., stars lying within a given color magnitude limits or stars 
lying in some specific region of space, one has to generate all 
the stars in the galaxy. 

We now discuss an efficient technique to sample a spec- 
ified multidimensional distribution. We first divide the en- 
tire domain into bins that extend perpendicular to the time 
axis. Then we subdivide these bins into smaller sub-domains, 
called leaf nodes, with a spatial oct-tree (each sub node having 
l/8th the volume of its parent node). Having subdivided the 
system, the von Neumann rejection technique is now applied 
individually to each of the nodes. This has two immediate 
advantages: 

• Depending upon the given geometry of the survey, one 
can check if the boundaries of the node intersect with 
that of the survey and, if not, one can skip the genera- 
tion of stars. 

• For accepting and rejecting stars it is now possible to set 
the maximum density ^max(r, r, 1), 1 being a vector rep- 
resenting the length of the sides of the node, adaptively 
for each node. Since the variation in stellar density in 
any node will be limited, g / ^max will never be very 
small, with the consequence that the rejection sampling 
method will not be inefficient. 

We still need to decide on an optimum truncation criterion 
to stop the splitting of nodes. An ideal truncation criterion is 
to have the least density variation within a node but this would 
result in a lot of nodes with negligible number of stars but an 
otherwise strong number density gradient. Instead, we adopt 
a criterion that strives to achieve a fixed number of stars in 
a node similar to the one used in adaptive mesh refinement 
schemes. For a total of A^tot stars, we set the resolution limit 
of a node as A/'tot/lO^, a choice that effectively generates 
about 10^ nodes. However, calculating the number of stars 
in a node involves an integral of the number density over the 
volume of the node, which can be computationally intensive 
when done for a large number of nodes. Hence, for the node 
splitting purpose, we compute the number of stars as 



N' . 

node 



^max(r,r, \)lil2hh 



(9) 



where ^max(r, 1) is an analytic density function represent- 
ing the maximum density within a node and h^h^h^ h are 
the length of the sides of the node. The final number of stars 
in a node, A/'node, is calculated by numerically integrating 
the number density over the node with increasing spatial and 
temporal resolution till the number converges with an uncer- 
tainty less than O.lV^node- The above convergence criterion 
roughly corresponds to 1/ 10th of the expected Poisson noise. 

To summarize, our scheme for generating a synthetic cata- 
log of stars using the adaptive von Neumann sampling algo- 
rithm is as follows. For each node, we first determine A/'"°^®, 
the number of stars that need to be generated. Next, to as- 
sign position and age to stars, we randomly select r and r and 
then accept or reject them using the von Neumann scheme. 
Next, the mass m for a star is sampled from the IMF ^(m) 
and Z from fz{Z^ r, r). The nearest isochrone corresponding 
to (Z, r) is selected and the stellar parameters corresponding 
to m are determined. The velocities are assigned according 
to the distribution function /vei(v, r, r), which is generally in 
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the form of an ellipsoid or can be subjected to more advanced 
treatment, an issue that we revisit later. Finally, we use a se- 
lection function to decide if a star enters the final catalog of 
the survey. The process of spawning stars is repeated for all 
nodes to generate the full catalog. 

During the star spawning process we do not take into ac- 
count the effect of photometric errors or extinction. This is 
because each observational survey will have different photo- 
metric error laws, and it is not possible to formulate a general 
scheme for this. Also, there is no perfect 3d model for ex- 
tinction. Although, we do provide a default extinction model 
(see Section [2^ , but a user might be interested in trying out 
other models too. However, photometric errors and extinc- 
tion can be easily handled as a post processing step and this 
is the approach that we follow. To accomplish this, first, one 
has to generate a catalog with color magnitude ranges slightly 
larger than the desired ranges and then for each star add the 
expected extinction and subsequently the photometric errors. 
Finally, using these transformed magnitudes one can select 
the stars lying within the desired color magnitude limits. 

2.3. Speeding up the star spawning process 

The process of spawning stars from a node, as discussed in 
the previous section, can be further optimized and we discuss 
this below. To being with, the distribution of stellar masses is 
such that there are a lot of low mass stars but very few high 
mass stars. Low mass stars are also low in luminosity hence 
for a magnitude limited survey only nearby low-mass stars 
are visible. On the other hand, the volume explored increases 
as the cube of the distance. Hence for most of the nodes the 
low mass stars do not enter the final catalog. We use this 
information to optimize our scheme — for each node we first 
determine the lowest stellar mass mJJ^n^ that can generate a 
visible star and then exclusively generate stars having mass 
above this limit. The number of stars from a node that fall 
into a survey is then given by 

J ^node J node 

= iV"°de^(>mrji'') (10) 

We use stochastic rounding to convert A^^g^® to an integer 
— i.e., if the fractional part of A^^g^® is less than a Poisson- 
distributed random number with a range between and 1 , we 
increment the integral part by 1 . 

We now describe the procedure to calculate mJJ^n^- If 
r is the heliocentric distance of a node, having sides of 
length /node, then calculating mjj^^jj® requires computing the 
minimum mass of a star that will be visible at a distance 
r — V^lnode for cach isochrone and then computing the global 
minimum, i.e., over isochrones of all ages and metallicities. 
Doing this individually for each node might be computa- 
tionally expensive. However, it is possible to optimize this. 
First, for a given galactic component, we identify the set of 
isochrones that have a non-zero probability of being sampled 
and compute mJJ^n^ over these set of isochrones only. Sec- 
ondly, we note that for a given set of isochrones m^^"^^ is 
a monotonically increasing function of heliocentric distance. 
Since, we can choose any value of m^^^ that is smaller than 
the smallest visible mass, we access the nodes in a sequence 
sorted by their minimum heliocentric distance, r — V^lnode^ 
and recompute m^^^ whenever the distance changes by a 



fixed step size (a step size of 0.1 mag in distance modulus). 

2.4. Sampling an N-body model 
An N-body model consists of a finite set of particles dis- 
tributed in phase space (r,v). Let /^'a^se%ace(^' ^) phase 
space distribution function describing this. An N-body par- 
ticle in general represents a collection of stars rather than 
a single star. Let m^'^^^^ be the mass of the stellar pop- 
ulation corresponding to the N-body particle and this rep- 
resents the mass of stars that were ever formed (summed 
over all ages). Also let us assume the age, metallicity and 
mass distribution to be given by ^(r), /^(Z, r), and ^(m) 
respectively. The distributions being normalized such that 
J '^{r)fz{Z^r)^{m)dZdrdm = 1. The distribution func- 
tion of stars is then given by 

f * = /"N-body / ^ 
J phase-space V-^ ' ^ J 

<i{T)fziZ,T)am)m''-''°'y/{m) (11) 

where the simulations provide the spatial and kinematic 
behavior represented by the phase space density term 

.N-body / X 
J phase-space V ' / ' 

For an N-body model, the fact that the number of N-body 
particles is finite leads to the following problem — if the num- 
ber density of visible stars at a given point in space is more 
than the number density of N-body particles at the same point 
one needs to oversample the particles. The condition for this 
is given by 

/ drdZ I /*^^^>/prse%ace(r,v), 

or in other words 

^N-body 

e(>^min(r))— — >1, (12) 

(m) 

^min(^) being the minimum mass of a star that is visible at 
a heliocentric distance of r. Typically such a situation arises 
when r is small, i.e., in regions close to the sun. Hence unless 
one simulates the N-body model with the number of particles 
equal to the number of stars in the model galaxy, one has to 
live with oversampling. The question now is how are we to 
assign positions and velocities to the oversampled particles? 

The positions of oversampled particles should be assigned 
in such a way that they preserve the underlying phase space 
density. For this we first review a popular scheme of density 
estimation known as the kernel density estimation, which is 
also statistically quite robust. In this scheme the number den- 
sity at a given point x for an N-body system of particles is 
given by 

where K{u) is a kernel function such that its integral over 
all space is unity, rj is the distance of the j-th particle from 
X, and n the dimensionality of the space. In a multivariate 
setting, rj is given by 

r, = y/(x-Xj)TSri(x-Xj) (14) 

where Y^T'^ is the matrix representing the distance metric and 
is normalized such that det(I]j) = 1. The smoothing length 
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TABLEJ 

Geometry of stellar components. The formulas used are from IRobin et al.I (120031) . Note, (R, 0, z) are the coordinates in the 



GALACTOCENTRIC CYLINDRICAL COORDINATE SYSTEM AND 



(FOR THE THIN DISC). 



Component Age (Gyr) density law p(r, r) 



Thin Disc 



Thin Disc 



Thick disc 



< 0.15 



0.15-10 



^g^{exp(-(a//^H+)2) - exp(-(a//iH- )2)} 
where: Hr^ = 5000 pc, Hr- = 3000 pc 

IMF- ^(m) oc m~i-^ for m < 1 Mq and ^(m) oc m~^-° for m > 1 M(r 



{exp(-(0.52 + ^)) - exp(-(0.52 + ^))} 



where: Hr^ = 2530 pc, Hr- = 1320 pc, 

IMF- ^{m) oc m~^-^ for m < 1 Mq and ^(m) oc m~^-^ for m > 1 Mq 
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Spheroid 



Bulge 



14 



10 



if|2;|<Xi, pc(^(r - 11) exp( j—^ 

if\z\>xi, pc(^(r - 11) exp( ^ 

where: /z/^ = 2500 pc, hz = 800 pc, xi 
IMF-^(m) oc 



c-/ A\ / Max(ac ,a) \ ^ 

M(r-14)(^ ^ 

where: a"^ = R'^ + 

ac = 500 pc, e = 0.64, nn 



^)x(l. 



1/hz 



xix(2. + xi/hz 



•-)x 



exp(xi/hz) 



) X^^) 



400 pc 



-2.77 



IMF- ^(m) oc m' 



-0.5 



if ^/x 



> Rc, 



PcS(r - 
PcS{t - 



10) exp(-0.5r2) 



if ^/x'^ -\-y' 

where: = ,/[( — )2 + (^) 

i^c = 2.54, xo = 1.59, yo = zq 
IMF- C(m) oc m-2-35 



10)exp(-0.5rJ) x exp(-0.5( 



212 I c^^4 

0.424, a = 78.9°, /3: 



: 3.5°, 7 = 91.3° 



ISM 



Dark halo 



P^^M-^-h^) X exp(-j^) 
where: /i^ = 4500 pc, /i^ = 140 pc 



Pc 



(l. + (a/i?c)2) 

where: Rc = 2697 pc and pc 



-- 0.1079 



of a particle is determined as the distance to the k-th nearest 
neighbor from the location of the particle. The physical inter- 
pretation of the density scheme is that if each point is assumed 
to be a hyper-ellipsoidal ball with mass distributed according 
to the kernel function K and having a covariance of /i^S, then 
the underlying smooth density field is simply the result of su- 
perposition of such balls. This suggests that if the spawned 
stars are also distributed around the particle in a similar fash- 
ion, i.e., distributed according to the kernel function K and 
having a covariance is /i^S, then they will essentially be sam- 
pling the density field of the N-body system. 

The question that remains is how to determine the optimum 
covariance matrix S? The simplest scheme is to use a global 
metric based on the variance of the data along each dimen- 
sion. A global metric, however, is insufficient for accurately 
calculating the phase space densities. What is needed is a 
locally adaptive metric that changes with the local configura- 
tion of the data at each point in space (iSharma & Steinmet3 
I2Q06I: ISharma & Johnstonl[2QQ9h . Such a metric can be cal- 
culated using the publicly available multidimensional d ensity 
estimation code EnBiD by ISharma & Steinmetzl (I2QQ6I) . En- 
BiD, which is an improve ment over a scheme proposed by 
lAscasibar & Binney ( 2005), makes use of a binary space par- 
titioning scheme along with the use of the information theo- 
retic concept of Shannon entropy to handle the issue of de- 
termining the optimal locally adaptive distance metric S"^ 
in a multi-dimensional space. For our use, we adopt the En- 



BiD scheme with the number of nearest neighbors k set to 64, 
which we find is adequate for reproducing a smooth distribu- 
tion of stars in the phase space. Too small a value leads to 
lumpiness around the N-body particles and too large a value 
erodes the positional accuracy of the data. We assume the ma- 
trix S to be diagonal and use the cubic cell scheme of EnBiD, 
i.e Sii = 1122 = ^33 and 1144 = = See (dimensions 1,2 
and 3 representing position coordinates while the dimensions 
4,5 and 6 the velocity coordinates). The EnBiD algorithm was 
applied individually to each of the disrupted satellites in the 
simulations. 

In order to assess the usefulness of EnBiD for our partic- 
ular application, we plot in Figure [T] (lower right panel) the 
distribution of the velocity scale factor \/T^ii/T^44 , which 
is the ratio of position to velocity scale and has units of 
kpc/ kms~^, as estimated by EnBiD for the N-body parti- 
cles from one of the simul ations of a disrupted satellites by 
iBuUock & JohnstonI (|2005|) (see Section l34l) . It can be seen 
that the scale factor varies by about 3 orders of magnitude, 
which demonstrates the inappropriateness of a global metric 
and the need for a locally adaptive metric scheme. The huge 
variation is due to the fact that during phase mixing the phase 
space density of particles is approximately conserved. This 
means that high density regions in position space will corre- 
spond to low density regions in velocity space and vice versa. 
This can be easily seen in the panels of Figure [T] where scat- 
ter plots of particles in position and velocity space has been 
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Fig. 1 . — Distribution of the velocity scale factor in units of kpc/ km 
for one of the disrupted satellites from BJ05 simulations and its dependence 
on location in phase space. The bottom left panel shows the distribution of 
velocity scale factor while the other panels show the scatter plot of particles 
in position and velocity space. The particles lying in the inner (r < 40) and 
outer regions (r > 80) of the halo are colored red and green respectively (r 
being the radial distance from the center of the stellar halo). High density 
regions in position space are low density regions in velocity space and vice 
versa. 

shown. The high density regions in position space (r < 40) 
are colored red while the low density regions in position space 
(r > 80) are colored green. The mean value of scale fac- 
tor (V^S^iT^) was found to be 0.057 and 2.9 for the red 
and green regions (see lower right panel for distributions). As 
a check we computed the scale factor using variance of the 

particles 



/ (cr2 + cr2 + cr2)/(cr2^ + + cTvz)- The values 

obtained were 0.037 and 2.3 for red and green particles re- 
spectively, which are in agreement with results reported by 
EnBiD. 

Summarizing, the scheme for sampling N-body distribu- 
tions is as follows. Select a particle i, determine if the dis- 
persing volume, defined by the kernel function K and the 
smoothing length hi, intersects with the survey volume, and 
calculate the number of visible stars it spawns, i.e., Nl^^ = 
^N-body ^j^^ mmin{r)) / {m) . We use stochastic rounding as 
discussed earlier to convert A^^-g to an integer. Next, for each 
spawned star we generate r, Z and m according to the dis- 
tribution functions ^t-, fz{Z^ r) and i{m) and disperse them 
in phase space using the scheme discussed above. Finally, 
we use a selection function to decide if a star enters the final 
catalog of the survey. 

2.4.1. Improving the positional accuracy while sampling an N-body 

model 

Dispersing the stars spawned by an N-body particle in 
phase space has its side effects. This adds noise to the data, 
the effect of which is to reduce the positional accuracy. When 
a particle is oversampled one cannot avoid the noise but when 
a particle is under-sampled or equally- sampled i.e, less than 





Fig. 2. — Position and velocity scatter plots of a disrupted satellite sampled 
from an N-body simulation of Bullock & Johnston ( 2005). The upper panels 
show the original N body data while the rest of them are for sampled data. 
The sampling fraction /sample is labeled on the panels. The panels in second 
row use an improved scheme that disperses the spawned stars in phase space 
only when more than one star is spawned by an N-body particle. Note, the 
stellar mass associated with an N-body particle is not necessarily constant. 



or equal to one star needs to be spawned from the N-body par- 
ticle, one can eliminate the noise by simply assigning the co- 
ordinates of the particle directly to the spawned star. But it is 
not possible to know a priori as to how many stars from a par- 
ticle will enter final catalog. To overcome this we implement a 
scheme which makes sure that, among the stars spawned by a 
particle if one of the stars can be reassigned the coordinates of 
its parent particle without it being excluded from the final cat- 
alog we do so. The effect of this improved sampling scheme 
is shown in Figure [2] which shows the position (left panels) 
and velocity (right panels) scatter plot of a disrupted satel- 
lite system from an N-body simulation of lBuUock & Johnstonl 
( 2005). The panels in the top row show the N-body parti- 
cles in the simulation, while the other rows show the stars 
sampled from it. The sampling fraction /sample, i-^., average 
number of spawned stars per N-body particle, is also labeled 
on each of the panels. For illustrative purpose we assume 
the number of stars spawned per unit mass of the particle to 
be constant but in a realistic simulation it will depend upon 
the heliocentric distance of the particle. First, we compare 
the first row with the second row that shows the results of 
the improved scheme for the case of /sample = 1- The dis- 
tribution of position coordinates is nearly identical for both 
rows which is what we expect when /sample = 1. On closer 
scrutiny, in some regions a loss of positional accuracy can be 



seen. Also some regions apparently seem to be undersampled. 
This departure from the ideal behavior is because the stellar 
mass associated with an N-body particle obtained from the 
simulations that we use, is not necessarily constant. So some 
N-body particles can spawn more than one star which subse- 
quently needs to be dispersed in phase space. Next, we com- 
pare the results of the improved scheme (second row) with the 
naive scheme (third row) that disperses all stars for the same 
value of /sample = 1. It Can be clearly seen that the improved 
scheme has better positional accuracy. For reference, the last 
row shows the case of /samp le = 4. We revisit this issue in 
Section \5J\ and Section [531 and discuss in greater detail the 
consequences of our scattering scheme for real applications. 

2.5. Stellar I sochrones 

Theoretical stellar isochrones are one of the main ingre- 
dients of our scheme. These are used to assign parameters 
like luminosity, effective temperature, magnitude and color 
to a star of a given age, metallici t v and mass. We employ 
Padoya isochrones dMarigo et alj I2QQ8 ; iMarigo & Girardil 
I2OO7I: iGirardi et al.l |^00; Bert eUi et al.1 [T9941) . which are 
available for a wide variety of photometric systems and un- 
like other popular isochrones also cover the asymptotic giant 
branch and the red clump phases of stellar evolution. 

We now describe the scheme for interpolating across differ- 
ent isochrones. An isochrone, for a given age and metallic- 
ity, is a table of stellar parameters, e.g., magnitudes, effec- 
tive temperature and gravity, as a function of stellar mass. 
Hence in a given isochrone, stellar parameters for any in- 
termediate mass can be easily computed by linear interpo- 
lation. However, interpolating across isochrones of differ- 
ent ages and metallicities involves the use of equivalent evo- 
lutionary points, which are not always provided with the 
isochrones. Hence, for the sake of flexibility as well as com- 
putational ease we adopt a simpler scheme. To begin with, 
a grid of 182 x 34 isochrones spanning an age interval of 
6.6 < logt/yr < 10.22 (step size of Alog(t) = 0.02 ) and 
metalHcity interval of 10~^ < Z < 3 x 10~^ (mean step 
size of A log(Z) ^ 0.072) were chosen. Next, for a star of a 
given age and metallicity, the nearest isochrone in the grid was 
identified. Subsequently, this isochrone was used to compute 
the stellar parameters by linear interpolation in mass. Where 
needed, the metallicity Z was converted to [Fe/H] using the 
relation 

[Fe/H] = log(Z/Z0); where Zq = 0.019 (15) 

In the above mentioned scheme, to simulate the color mag- 
nitude diagrams accurately, the adopted grid should have a 
fine enough resolution; and our adopted resolution was found 
to be adequate for this purpose. However, for greater accuracy 
if desired one can increase the resolution of the grid. 

The Padova isochrones are limited to stellar masses 
greater than 0.15 Mq. For lower masses extending up 
to the hydrogen mass burning limit (0.07 Mc^ < m < 
0.15 Mq) we use the isochrones from lChabrier et al] (I2Q0QI) . 
Since, these isochrones are only available for solar metal- 
licity, we us e d them for all metallicities. Also, the 
IChabrier et al.l (I2QQQI) isochrones are only available for 
Johnson-Cousin bands, hence, in order to able to sup- 
port a variety of photometric bands, we choose to com- 
pute absolute magnitudes for the above mentioned low 



mass stars using the bolometric correction (BC) table s from 
|http : / /stev . oapd. inaf . it /dustyAGB07/| which 
is a database of BC tables for different photomet ric bands pro- 
vided by th e Padova group (Marigo et al. 2008; Girardi et"an 
'2002', f^OO?). 

Note, in the present version of the code we do not model 
the white dwarfs. White dwarfs are quite faint (My > 10) 
and rare compared to other type of stars and hence for most 
applications they do not dominate the star counts. However, 
in applications where one expects to find white dwarfs caution 
should be exercised when interpreting the results of Galaxia. 

2.6. Extinction 

Most stellar observations suffer from extinction by dust and 
this needs to be taken into account when comparing simu- 
lated catalogs with stellar surveys. Although the total extinc- 
tion along a given li ne of sight has been accurately mapped 
(|Schlegel et al.lll998[). the same cannot be said for the 3D dis- 
tribution of dust. TPrimmel et aP (l2QQ3h provide a sophisti- 
cated model for the 3D dust distribution, and other alternatives 
include Marshall et al. ( 2006). Since a comprehensive extinc- 
tion model is beyond the scope of this paper, we provide here 
only a simple extinction correction scheme. Note, extinction 
is basically a post processing step and once an extinction-free 
catalog has been generated using Gahxia, one can subject it 
to the extinction model of one's choice. 

The extinction scheme we use is a refinement of the method 
proposed by Bland-Hawthorn et al. ( 2010 b). The 3D distribu- 
tion of the dust is modeled as a double exponential disc. 



■ exp 



R — Rq 



X exp 



^warp I 



(16) 



where 2;warp(^, and /cflare(^) describe the warp and flare 



of the disc and are described in Section [3]2l The values of the 
parameters controlling the warp and flare were assumed to be 
same a s that of the stellar disc. Next, we use the lSchlegel et al.l 
(|1998|) E{B — V) dust maps to evaluate the best fit parameters 
of the dust model and obtained Jir = 4200 pc, hz = ^^pc and 
po = 0.54mag/ kpc. The fit parameters were obtained by 
minimizing 



X 



E 



pmodel 



(17) 



^ The isochrones were downloaded 

|http : / / stev .'oa pd . inaf . it /cgi-bin/cmd 



from 



where Ef^^dei ^ j^c 

PDust(^i, ^i, '^)dr is the extinction in the 
cell i defined in the (/, b) space. The fit was performed in the 
range —15° < b < 15° assuming Rq = 8.0 which defines 
the distance scale. 

The modeled dust disc is then used to generate a 3D extinc- 
tion map in galactic coordinates b and r, by integrating the 
dust density along different lines of sight. The angular reso- 
lution of the 3D maps was assumed to be 1024 x 1024 while 
in the radial directions 512 logarithmically spaced bins in the 
range 0.01 < r < 30 kpc were used. Values for any arbitrary 
position were obtained from the maps by linear interpolation 
in (r, / , 6). Alternately, one can also assume a directional de- 
pendent normalization constant po such that the total extinc- 
tio n at infinity along a gi ven line of sight equals the value in 
the'Schlegel et al." ("1998') maps. For this we use high resolu- 
tion Schlegel et al. (1998) maps having an angular resolution 
of 4096 X 4096 in / and b. We employ this as our default 
scheme. 
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TABLE 2 

Age and metallicity distribution (mean and dispersion) of 
galactic comp onents. the values shown are from 
IRobin et alI ( l2003l) 





Age (Gyr) 


[Fe/H] 


C^[Fe/Hl 


d[Fe/H]/dR 


Thin disc 


0-0.15 


-0.01 


0.12 






0.15-1 


-0.03 


0.12 






1-2 


-0.03 


0.10 






2-3 


-0.01 


0.11 


-0.07 




3-5 


-0.07 


0.18 






5-7 


-0.14 


0.17 






7-10 


-0.37 


0.20 




Thick disc 


11 


-0.78 


0.30 





Stellar Halo 


14 


-1.78 


0.50 





Bulge 


10 


0.00 


0.40 






Note, Schlegel extinction maps are known to overesti 
mate the extinction in the Galactic plane (regions where 
Ay > 0.5 mag), by about 30% (Arce & G oodmanlll99 ' 
ICambresv et aH i20Q5h . While |Arce & Goodmanl Sl99 
report res ults in a st rip of few degrees in length only, 
IC a mbre sy et alJ (|2QQ5|) report mean values in galactic anti- 
center hemisphere only. Since a proper recalibrated map is 



not yet available it is difficult to take these effects into ac- 
count. In addition to directly affecting the total value of ex- 
tinction along a particular line of sight, the overestimation is 
also expected to affect the determination of our scale height 
for the dust disc, which in turn will affect the variation of ex- 
tinction with distance. If we assume the overestimation factor 
to be a monontonic function of extinction we expect the ac- 
tual scale height of the dust disc to be larger than what we 
find here. Hence, we advise caution to be exercised when us- 
ing our extinction scheme in the galactic plane. 

3. A WORKING MODEL OF THE GALACTIC COMPONENTS 

The main ingredients of our galactic model as described by 
Equation (O are the star formation rate (SFR), the age velocity 
relation (AVR), the IMF and the density profiles. We now dis- 
cuss the adopted functional forms for each of these functions 
so as to arrive at a working model of the Galaxy. Instead of 
doing a detailed modeling we here implement the well known 
and well tested Besan9on model. The density distributions 
and the ages and metallicity of various galactic components 
are given i n Table ffl and Tabl e O these are the same as the 
one used by lRobin et al.l (|2QQ3|) (R03 hereafter). The thin disc 
spans an age interval of — 10 Gyr. On the other hand, the 
thick disc, the bulge and the halo are all assumed to have a 
constant age. 

3.1. Thin and thick disc 

In the Besan9on model, the axis ratio e and velocity disper- 
sions (Ji^, (70 and az, for the thin disc, are tabula ted as a func- 
tion of age r (velocity ellipsoid being taken from lGomez et al.l 
fl997). Here we parameterize them as functions. The axis ra- 
tio e is parameterized as 



e(r) = Min 0.0791,0.104 



' max ' mm 



0.5> 



(18) 



and for velocity dispersions we use Equation (O. To match 
the Besan9on values, the velocity dispersion for the thin 
disc is assumed to saturate at 0.862(7^^0, 00,20 and a value of 
Tmax = 10.0 and Tmin = 0.1 is uscd. The values of param- 
eters used are summarized in Table [3] The adopted values 



TABLE 3 

Velocity ellipsoid of stellar components. Note, (R, cf), z) are 

THE COORDINATES IN THE GALACTOCENTRIC CYLINDRICAL 
COORDINATE SYSTEM. 











q 


/3 




kms/s 


kms/s 


kms/s 






Thin disc 


50 


32.3 


21 


0.33 


0.33 


Thick disc 


67 


51 


42 


0.33 


0.33 


Spheroid 


141 


75 


75 








Bulge 


110 


110 


100 









for the thin and thick disc reproduce the Besangon results. 
Note, the radial dependence of velocity dispersions is mod- 
eled here using Equation ([6| (values of q and {3 being from 
ISchonrich & Binneyl 12009), rather than the Besan^on model 
that assumes dlnaj^/dR = —0.2 kpc~^ and zero derivative 
for other components. Note, the circular velocity profile is 
computed from the Besan^on mass model and at the location 
of the Sun it has a value of 226.84 km s~^. 

3.2. Warp and flare 

The thin and thick discs are assumed to have a warp and a 
flare that is modeled following the prescription of R03. As- 
suming galactocentric cylindrical coordinates (i?, ^, z), stars 
with radius R > Rwarp are displaced perpendicular to the 
plane by an amount 



^warp 



{R, (j)) = 7warpMin(i?warp, R - ^warp)cos(0 - 0max) 

(19) 

where (/)max is the direction in which the warp is maximum. 
For flaring, stars with R > R^are have their scale heights 
increased by a factor 



k{[are{R) = 1 + 7flareMin(i?flare, R " ^flare) 



(20) 



For the parameters we adopt the same values as that used by 

R03, namely (/)max 90.0, 7warp 0.18 kpc"\ i^flare 

I.URq and7flare = 0.0054 kpc-^ 
3.3. Bulge 

It has been well established that the Milky Way hosts a bar- 
shaped bulge (Blitz 1993). With the arrival of the COBE in- 
frared maps, it has been possible to construct 3d models to 
fit the data. Various models usin g different tr iaxial analytic 
functions were presented by Dwek et aH (119951) . of which the 
G2 model (boxy triaxial Gaussian type functions) provided 
the best fit to the data. The G2 bar is used by the Besangon 
model and is adopted by us in our initial demonstration of the 
Galaxia model. The G2 density distribution has a core at the 
center defined by radius Rc and the distribution is elongated 
along one axis which defines the major axis. The orientation 
of the major axis is defined by three angles a, /3 and 7. The 
values for these parameters are taken from R03, where they 
obtained the values by fitting the model to the near infrared 
star counts of the DENIS survey. 

Next, we describe the kinematic properties of our 
model. Self-consistent dynamical models have been pre- 
sented that use either the Schwarzschild technique Zhaol 
(1996) or are extracted fro m N-body simulations (Fux 1999; 
lAthanassoula & MisiriotisI |2002|) . but these are beyond the 
scope of the current paper. Instead, we attempt to create a 
simple working model that roughly satisfies the existing ob- 
servational data and would be useful for studying the system- 
atic effects that cloud the interpretation of observation data. 
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A number of earlier studies have claimed that the bulge is in 
solid body rotation by fitting a straight line to the plot of mean 
radial velocity as a function of longitude (also referred to as a 
rotatio n curve) , e.g ., by Me nziesI (ll99Q|) using Mira variables 
and by'Izumiu ra et al.i (^1995) using SiO maser stars. A slope 
of about 10 kms~^deg~^ has generally been reported, which 
translates to a pattern speed of ^7 = 71.62 kms~^ kpc~^, as- 
suming 8 kpc as the distance to the galactic center. Recently, 
iHoward et al.l (l2QQ9h have also measured the rotation curve 
using bulge M-giants in two strips at 6 = —4 and b = —8. 
They report the two rotation curves to be nearly identical, sug- 
gesting that the bulge is rotating cylindrically. A straight line 
fit (by eye) to their rotation curve also seems to suggest a value 
close to 10 kms~^deg~^. Hence, for simplicity we adopt the 
value of 1^ = 7 1.62 kms~^ kpc~^ f or the pattern speed of 
the bulge. Note, iHoward et"an (l2008h also report that the ro- 
tation curve for the b = —4 strip, shows evidence of flattening 
beyond |/| > 4. 

Having specified the rotation we next review the velocity 
dispersions. Specifically, we attempt to fix the values for the 
velocity dispersions cri^,cr0 and of our model, expressed 
in cylindrical coordinates with respect to the galactic center. 
A value of cr^ = 110.0 kms~^ has typically been reported 
for the line of sight velocity dispersion in Baad e's window 
(iTerndrup et al.lll995l: iBinnev & Merrifieldlll998|) and this is 
what we ass ume for aR. Recent results by IHoward et al.l 
(|2009L |2008|) also confirm this but show a variation with both 
latitude and longitude- at 6 = — 4, is found to increase 
from 70 kms~^ to 110 kms~^ with |/| varying from 10° to 
0°, while at 6 = — 8, is found to be relatively constant with 
a value of 70 kms~^. 

Using proper motions, velocity dispersions along / and b 
have also been mea sured and an anisotropy of a J ah = 1-15 
has been reported (Rattenburv et al.ll2007l: Ivieira et al.l 120071: 
^paenhauer et al. 1992). The anisotropy could both be due 
to intrinsic anisotropy as well as due t o the rotational broad- 
ening as demonstrated by I Zhao et al.l (Il99 6>). Since it is not 
possible to exactly deduce the value of from ai we assume 
for simplicity = aR and choose the value of so as to 
reproduce the observed value of anisotropy ratio in Baades's 
window. For a choice of = 100 kms~^, the bulge stars 
lying in the field (/, 6) = (1°, —4°) and with heliocentric dis- 
tance between 7 — 9 kpc, were found to have cri/ab = 1.17. 



3.4. Simulated stellar halo 

To make theoretical predictions of structures in the 
stellar halo, we use the eleven stellar halo models of 
iBullock & JohnstonI (|2QQ5|) . which were simulated within the 
context of the ACDM cosmological paradigm. These simu- 
lations follow the accretion of individual satellites modeled 
as A^-body particle systems onto a galaxy whose disc, bulge, 
and halo potential is represented by time dependent analyt- 
ical functions. Semi-analytical prescriptions are used to as- 
sign a star formation history to each satellite and a leaky ac- 
creting box, chemical enrichment model is used to calculate 
the metallicity as a func tion of ag e for the stellar populations 
(^Robertson et al. 2005; Font et al. 2006). The dark matter ha- 
los are assumed to follow an NFW density profile whereas, 
the stellar matter is assumed to follow a Kings profile embed- 
ded within the dark matter halos. To simulate Kings profile 
within dark matter profiles the dark matter particles are as- 
signed a stellar mass which is a function of the energy of the 



dark matter particles 0. The stellar masses depend upon the 
adopted star formation time scale and the later is chosen by 
requiring that the simulated satellite stellar distributions re- 
produce the structural properties of Local Group dwarf galax- 
ies. The three main model parameters of an accreting satellite 
are the time since accretion, tacc, its luminosity, Lgat, and the 
circularity of its orbit, defined as e = J/ Jdrc (J being the 
angular momentum of the orbit and Jdrc the angular momen- 
tum of a circular orbit having same energy). The distribution 
of these three parameters describes the accretion history of a 
halo. To study the sensitivity of the properties of structures 
in the stellar halo to accretion history, additionally a set of 
six artificial stellar halo models (referred to a s non-ACDM 
halos) were generated by I Johnston et al.l (l2008h . These have 
accretion events that are predominantly (i) radial (e < 0.2), 
(ii) circular (e > 0.7), (iii) old (tacc > H Gyr), (iv) young 
(tacc < 8 Gyr), (y) high-luminosity (L^at > lO^L©), and (vi) 
low-luminosity (L sat < 10^ Lq). 

3.5. Analytic smooth stellar halo 

For some applications it is also necessary to have a smooth 
stellar halo. We do this by assuming an oblate power law 
distribution with ellipticity e = 0.76 and power law index of 
nn = —2.44 as suggested by R03. The val ues of e and nn can 
be altered if desired, e.g., recent results by lJuric et al.l (l2QQ8h 
using SDSS predict a flatter halo (e = 0.64) and steeper value 
for the power l aw index (nu = —2.77). In accordance with 
the results of Bo nd et aO ([2()1Q[) . the principal axes of the ve- 
locity ellipsoid are assumed to be aligned with a spherical co- 
ordinate system and the velocity dispersions are assumed to be 
(Jr = 141 kms~^ and ag* = = 75 kms~^. The spheroid 
is assumed to have no net rotation. 

3.6. Initial mass function and normalizations 

Having specified the density functions, we now provide 
the the initial mass function ^ and the normalization con- 
stants like pc and the star formation rate ^(r). As in the 
Besangon model, the IMF was assumed to be of a power 
law form, ^(m) = m~", the values of a are tabulated in 
Table \T\ The mass limits of the IMF were defined to be in 
the range 0.07 < m < 100. Next, the density normal- 
ization constants were chosen to reproduce the local mass 
density of current stars po for the various galactic compo- 
nents in the solar neighborhood (Table 2 in R03). This re- 
sulted in a star formation rate of ^ = 2.85 Mq/jt and pc 
of 1.55 X 10^ and 1.31 x 10^ Mokpc"^ for the thick disc 
and the stellar halo respectively. However, for the thin disc in 
the regime 7 < r < 10 Gyr the SFR had to be lowered by 
20% in order to match the star counts in the Besangon model. 
This is because in this regime the isochrones employed by 
us differ from those in the Besangon model. For the bulge 
Pc = 3.49xlO^M0kpc-^ was selected which gives a central 
density of 13.76 stars pc~^ in accordance with the Besangon 
model. 

For our analysis, we assume the Sun to be located at 
a radial dis tance of 8 kpc from th e center of the galaxy 
( Eisenhauer et al.1 120031: |Reidl 119931) Since our adopted ra- 
dial distance of Sun is different from the Besangon model 
(8.5 kpc), we need to recompute the SFR. The re-calibrated 
value of SFR for our adopted location of the Sun turns out to 

^ We actually use the massless test particles which ha ve 10 times the phase 
space resolution of the original dark matter particle (see IBullock & JohnstonI 
( 2005) for further details). 
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Fig. 3. — Star count predictions of Galaxia in various directions compared 
with those of the Besangon model. For the direction along the North pole the 
contributions for different galactic components are shown separately. 

be 2.37 Mq/jt. Also, to match the star counts for the stel- 
lar halo at large distances (along the north galactic pole) with 
those of the Besangon model, we had to increase the local 
mass density of the stellar halo by 10%. 

4. TESTS AND RESULTS 

In this section we test the code by comparing its predictions 
with various observational constraints. 

4.1. Comparison with Be sangon 

The star count predictions of the Besangon model have been 
tested against a variety of observations, and since Galaxia uses 
the same disc model, it suffices for most situations to show 
that Galaxia reproduces the Besangon model. In order to keep 
the test simple dust extinction was neglected. To test the cor- 
respondence of our results with that of Besangon, we plot in 
FigureOthe star counts as a function V band magnitude along 
three directions; the north galactic pole, 6) = (90°, 10.0°) 
and (/, b) = (270°, 10.0°). The last two directions were cho- 
sen to illustrate the effect of warping which results in a bifur- 
cation in the star counts at fainter magnitudes. Overall we find 
good agreement with the curves from Galaxia corresponding 
closely to the Besan9on curves. 

For the direction along the galactic pole, the star counts for 
each of the galactic components are also shown separately. 
It can be seen that the thick disc and the stellar halo are in 
perfect agreement but the thin disc shows some subtle differ- 
ences. For the Galaxia thin disc, there is a slight excess of in- 
termediate magnitude stars and a shortage of extremely faint 
magnitude stars. This same effect can be seen in the other two 
directions also but is shifted to even fainter magnitudes. These 
discrepancies are due to the differences in the isochrones em- 
ployed by us as compared to those used by R03. Specifically 
for the thin disc in the mass range 0.15 < m < 0.6, the ab- 
solute magnitude of stars predicted by the Padova isochrones 
were found to be brighter than that of the isochrones used by 
the Besangon model. 

4.2. Comparison with the Hipparcos data 

The predictions of stellar properties in the solar neighbor- 
hood is an essential test for any galactic model. For these 



-5 


_ Hipparcos V<8.00, 8353 stars 


_ Galaxia V<8.00, 1 1 1 69 stars 











^^^^ 




5 






10 







-1.0 -0.5 0.0 0.5 1.0 1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 

B-V B-V 

Fig. 4. — Comparison of color-magnitude diagram obtained from the Hip- 
parcos catalog (left panel) with that of simulation from Galaxia (right panel). 
The plots show stars in the solar neighborhood defined by V < 8 and dis- 
tance r < 100 pc. 
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Fig. 5. — Comparison of distributions of radial distance, absolute mag- 
nitude and color for stars obtained from the Hipparcos catalog with that of 
simulation from Galaxia. 

Stars, accurate distances can be obtained, which enables the 
construction of the color - absolute rn agnitude diagram of stars. 
The Hipparcos mission (lESAlll997h produced an astrometric 
data base of 117955 stars down to a limiting magnitude of 
V ~ 12.4. For stars with V < 9, the data have a median 
precision of about 1 mas which is ideal for analyzing the solar 
neighborhood. The completeness limit of the Hipparcos cat- 
alog is estimated to be between 7.3-9 mag in V band. Hence 
we use the following criterion to e xtract a volume com plete 
sample from the Hipparcos catalog (Ivan Leeuwenll2Q07l new 
reduction) — V < 8 and parallax tt > 10 mas. Binary stars 
were excluded from the analysis. The Hipparcos magnitude. 
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Hp, was converted to V band magnitude using the relation 
Hp = V^ 0.408(5 -V)- 0.13(5 - Vf from lTuron et"an 
Cl992). Next, we use Galaxia to generate a similar catalog. 
The parallax errors for Hipparcos, which a re a functi on of ap- 
parent magnitude, were simulated as in Girar di et al.l (|2005), 
where such an analysis was earlier performed. Although, 
iGirardi et"^ ([2005) had not taken extinction into account we 
here do take it into account. For this study w e adopt the 
extinction model presented in (lAumer & Binneyl 12009) such 
that, 

E{B -V) = Max(0, 0A7{d - 0.07 kpc)), (21) 

which takes into account the fact that the extinction within 70 
pc from the Sun is negligible. The reddening was converted 
to extinction assuming an Ry of 3.1. Note, the extinction is 
quite small, and hence it is not expected to have any signifi- 
cant impact on the results. 

Figure H] shows the color magnitude distribution of stars 
within 100 pc from the Sun using the Hipparcos and the sim- 
ulated data sets. The plots are very similar. In Figure [5] 
the probability distributions of absolute magnitude, color and 
distance are also shown. All of the distributions show good 
agreement with the Hipparcos data. Some minor differences 
do exist, in particular: (i) In the radial distribution of stars 
there is a slight excess of stars at r = 45 pc and t his is due 
to the Hyades cl uster as pointe d out by IGirardi et al. ( 2005). 
However, unlike IGirardi et^I] ((2005) our overall radial dis- 
tribution is in good agreement with Hipparcos and does not 
show a deficit at low r and excess at high r as reported by 
them, (ii) The absolute magnitude distribution of Galaxia 
shows a mild excess at My ^ 1 and a sli ght deficit at My 
3. This excess was also observed by Girardi et al. (^05) 
and is of comparable magnitude. Again, unlike Girardi et aL 
(|2005|) we do not see any deficit in the faint tail (My > 4). (iii) 
In the color distribution, the red clump peak at B — V l 
is shifted slightly to the red side for Galaxia an d the blue tail 
shows a slight excess of stars. As remarked by Girardi et al." 
(^2005) these discrepancies could be due to imperfections in 
the stellar evolution models or due to imperfect simulation of 
the parallax errors and should be investigated in future. 

Other than the minor differences discussed above, there is 
one major difference between the two catalogs. The number 
of stars in the simulated catalog is about 33% higher. This is 
due to the fact that we have excluded the stellar multiplicity 
(e.g. unresolved binary systems) from the Hipparcos catalog. 
For the selection limits used by us the Hipparcos catalog con- 
tains about 1549 sources that are flagged as binary. Multiply- 
ing this number by two and adding it to the number of single 
sources (8353) we get a total of about 11451 stars, which is 
in good agreement with the number of stars in the simulated 
catalog (11169). 

4.3. Comparison with solar neighborhood kinematics from 
the Geneva-Copenhagen survey 

The best kinematic data to date are largely restricted to the 
solar neighborhood. We now compare these observations with 
kinematic simulations from Galaxia. We use the data fr om the 
Geneva-Copenhagen Survey, GCS, (Nordstrom e t^ 120041: 
iHolmberg et al.ll2009|) . which is a selection of 16682 F and G 
type main sequence stars, out of which velocities and temper- 
atures are available for 13382 stars. The GCS catalog is com- 
plete only till r ~ 40pc in volume and V ~ 8 in magnitude 
and within these limits there are only about 1000 stars. Hence, 
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Fig. 6. — Distribution of GCS stars in the (Teff , My) plane and the choice 
of selection functions. The top panel shows the GCS stars lying within r < 
0.12 kpc and the lower panel shows the stars sampled with Galaxia. The 
shaded region in the lower panel shows the selection region used to mimic 
the GCS stars. The line represents the equation My = 115 — 30 log(Teff ). 

to increase our sample size we select stars within 120 pc from 
the Sun and with V < 9.6, 10893 stars were found to sat- 
isfy this criteria. A drawback of this is that beyond the com- 
pleteness limit of the survey, it is not possible to deduce the 
exact selection functions and these are needed to mimic the 
GCS catalog with Galaxia. However, the incompleteness was 
found to have little effect on the kinematic properties of the 
sample. To check this we compared the velocity distributions 
of the complete and incomplete samples as described above 
and found them to be nearly identical. Hence, an approxi- 
mate selection function that can reproduce the main proper- 
ties of the GCS sample, namely, the selection of F and G type 
dwarfs with minimal contamination from red giants, should 
be sufficient for our purpose. To do this, we plot in Figure [6] 
the distribution of our selected GCS stars, in the (Teff, My) 
plane. The shaded region in the plot shows the desired se- 
lection function in the (Teff,My) plane, which can be de- 
scribed by the following equations 3.65 < log(Teff) < 3.84 
and My < 115.0 — 301og(Teff). We use this to reproduce 
the GCS catalog with Galaxia and this is shown in the lower 
panel. The stars generated with Galaxia were also restricted 
to V < 9.6 in magnitude and r < 120pc in distance. 

Next, in Figure [71 we plot the predicted distributions of the 
U, V and W component of velocities and compare them to the 
GCS survey. It can be seen that the model (the red curves) 
provides a reasonable fit to the distributions of all the three 
components of velocities. In the plots shown in the panels, the 
Sun's motion with respect to the local standard of rest (LSR) 
was assumed to be Ug) 11.1 Vc:^ 12.24 and Wq 7.25 
as given by Scho nrich et al.l (120101) . 

In Galaxia we model the distribution of the compo- 
nent of velocities using the asymmetric drift relation given 
by Equation ([5]). As mentioned earlier, an alternative way to 
model the velocities is to use the Shu distribution func- 
tion which provides a more accurate treatment of the radial 
motion in the disc (see Appendix lAl for further details). In 
Figure [7] this is shown as the dashed curve. The Shu distri- 
bution function (DF) is found to overestimate the negative tail 
slightly and its peak is also slightly to the right as compared to 
th e one predicted by Equation Q. Using the iterative scheme 
of iDeh nen (1999) one can further improve the accuracy of 
the solution (dark green solid line in Figure [7]), but even this 
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Fig. 7. — Predicted velocity distributions near the Sun. Shown are the 
U, V and W component of velocities. The dotted lines show the distribu- 
tions of F and G type stars in the GCS survey of the solar neighborhood 
dNordstrom etanf2 004) (r < 0.12 kpc). 

has only a marginal effect in resolving the overestimation of 
the negative tail. On the other hand, lowering aro for both 
the thin and thick disc was found to reduce this discrepancy 
but this also makes the U distribution sharper than that ob- 
served for GCS stars. It should be noted that in the regime 
V < —60 km where the tail is overestimated, Poisson er- 
rors are also high, owing to low number of stars, hence we 
refrain from trying to alter the model to fit the data at this 
stage. Note, the fact that the Stromberg relation provides a 
better fit to the negative tail than the Shu DF, is only because 
it underestimates the negative tail as compared to the distri- 
bution expected when radial motions are properly taken into 
account. 

Finally, in a manner similar to lSchonrich et al.l (I201QI) . our 
model distributions can also be used to calculate the solar 
motion with respect to the LSR. To accomplish this we ap- 
plied a minimization scheme and fitted our model /7, W 
distributions to the distributions of the GCS stars (a uniform 
weighting scheme was used). We found (Uq^Vq^Wq) = 
(11.25±0.3, 10.93±1.2, 7.86±Q.3)k ms-^ (la range) , which 
are in good agreement with values of Schonrich et al.l (|2010). 
However, our predicted value of Vq is closer to value reported 
by Binney (2010) (11 kms~^) than that of Schonrich et al. 
(|2010|) . Note, this result is with the V component of velocity 
being modeled using the Shu distribution function, if instead 
the relation given by Equation ([5]) is used, the best fit value 
of ^0 is found to be even lower (~ 10 kms~^). However, 
this discrepancy is not significant considering the fact that a 
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Fig. 8. — Vertical distribution of star counts using Galaxia and its compar- 
ison with published results. 

random uncertainty of about 1 kms~^ is associated with it. 
Moreover, the substructures being quite dominant in the dis- 
tribution a sys tematic uncertainty of as high as 2 kms~^ is 
also expected (|Schonrich et al.ll2QTQ|) . 

4.4. Vertical distribution of stars in the solar neighborhood 

Predictions for vertical distribution of stars is another im- 
portant test for a model. Traditionally the vertical distribu- 
tion is modeled as a sum of two exponentials representing 
the thin and the thick disc. The best fit parameters as pro- 
vided by Juric et al. (2008) for the M dwarfs in SDSS, af- 
ter correction for biases, e.g., stellar multiplicity, are a thin 
disc with a scale length of 300 pc, a thick disc with a scale 
length of 900 pc, and a local thin to thick disc normaliza- 
tion of / = 13 %. Using K dwarfs towards the south galac- 
tic pole (SGP), iKuijken & Gilmord ([l989) report a value of 
/ = 4.27% and scale lengths of 249 pc and 1000 pc for 
the thin and thic k discs, respectively. A lso using the data to- 
wards the SGP, Gilmore & ReidI (|1983 ) provide the vertical 
distribution of stars for different absolute magnitude ranges 
— we use results reported for magnitude range 4 < My < 5 
since stars at higher absolute magnitudes could be contami- 
nated by giants. Figure [8] shows these published results along- 
side our results using Galax ia. Also i s shown the results 
from the model of Just & JahreilBl (120101) . Note, we normal- 
ize all the profiles to the density at z = 0.4 kpc. This is 
because density estimates close to 2; = cannot be reliably 
obtained and one has to use interpolation to get the local nor- 
malizations, which makes the estimates strongly dependent 
upon the assumed shape of the profile Q. Although using the 
SDSS selection functions would have been ideal for compar- 
ing with the predictions of Galaxia, the theoretical isochrones 

^ Interestingly, a compilation from literature of local normalization versus 
scale height by Arnadottir et al. ( 2009) seems to suggest that, in spite of a 
large spread in the values of these quantities, the estimated total mass between 
different surveys is roughly the same. 
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V vs proper motion /j^i towards {l,b) = 

Upper panel shows results obtained with 
Galaxia while the lower panel shows results from the Besangon model. 



are not very accurate in the M dwarf regime at the present 
time. Hence, we use a color range of 0.8 < B — V < 1.0 
and My > 4.5 to identify the main sequence dwarfs and 
report the res ults for them. Our resu lts correspond closest 
to the plot of iGilmore&ReidI (Il983h . Closer to the plane 
(z < 0.5 kpc) th ey are also in good agreement with that of 
lJuric et all (|2008|) . but at larger distances they differ. We also 
show the results with stars selected according to their absolute 
magnitude (4.0 < My < 5.0). This is quite close to the color 
limited profile but differs slightly in the regime z > 1. This 
shows the effect of selection bias on the profile. Note, our as- 
sumed location of the Sun above the galactic plane (15 pc) is 
slightly less than that reported by Juric et al. (2008) (25 pc), 
but this has negligible impact on the analysis presented here. 



We have implemented the scheme in the form of a serial 
code written in C++. It uses about 500 MB RAM and the 
amount of memory used is independent of the size of the cat- 
alog being simulated. However, the CPU time depends sensi- 
tively upon the parameters of the survey being simulated. In 
what follows, we report CPU times by running the code on a 
2.44 GHz single Intel processor. 

Recall that in our scheme the Galaxy is divided into roughly 
equal mass cubical boxes (nodes), and stars are generated 
from only those nodes that intersect with the survey volume. 
This causes the run time to vary nearly linearly with the mass 
of the galaxy being sampled by the survey. Now, for a given 
survey there are two main parts that consume most of the CPU 
time: a) time to pre-process a node, and b) time to generate 
stars in a node. Pre-processing all of the nodes takes about 
250 seconds. For surveys sampling a smaller mass fraction, 
this is also proportionately lower. Generating stars is where 
the code spends most time and to measure this we define the 
speed of the code as the number of stars generated per sec- 
ond. The code can reach a maximum speed of 0.2 million 
stars per second when generating all stars in the galaxy. For 
magnitude limited surveys, the speed is slightly lower, e.g., a 
V < 20 survey runs at 0.16 million stars per second while 
a V < 15 survey runs at 0.05 million stars per second. The 
optimization of the code to generate only stars that can en- 
ter the catalog is imperfect, especially if a galactic component 
has a large spread in metallicity. This is something we hope 
to improve in future. 

With the speeds given above, an all-sky GAIA-style survey 
with V < 20, consisting of 4 billion stars can be generated in 
about 6.5 hours. In comparison, the scheme of IBrown et al.l 
(|2005|) required about 2 weeks to generate a GAIA like sam- 
ple of 3.5 x 10^ stars while running on a cluster of processors. 
On the other hand, slV < 20 survey confined to 10,000 sq de- 
gree towards the north galactic pole, consisting of a sample of 
3x10^ stars can be generated in 220 seconds. 

Note, running a single instance of the code is sufficient for 
generating catalogs of sizes up to 10^ stars in less than half 
an hour, but for larger catalogs one could easily parallelize 
the code. Parallelization simply involves running multiple in- 
stances of the code, each with a different random seed and 
generating a fixed fraction of the full galaxy. 



4.5. Color vs proper motion comparison 

In R03, distributions of color vs proper motion from 
Besangon model were show n and found to b e in good agree- 
ment with observations of Ojha et al] (|1996|) . Here we repeat 
the same analysis and compare our results with those of the 
Besan9on model (Figure O. Stars were identified in a 15.5 
square degree area in the direction of (/,6) = (3,47) and 
with magnitudes in the range 12<V<18. In upper panel 
of Figure [9] the bluer region is dominated by the thick disc 
and the halo, while the redder region is dominated by the thin 
disc. It can be seen that our distribution is in good agree- 
ment with that of the Besangon model. However, subtle dif- 
ferences can be identified in the region 1 < B — V < 1.5. 
At around B — V 1.0, the dispersion of jii is slightly less 
for Besangon. Also, the Besan9on stars extend further red- 
wards. These differences owe their origin to the differences 
between the isochrones used by us and the Besan9on model 
(see Section |4J1) . 



4.6. Computational performance 



5. APPLICATIONS 

5.1. All- sky distributions of giants and red clumps 

As an application of the code, we investigate the distri- 
bution of giants and red clump stars over the sky. We use 
the following criteria to identify the stars that are giants and 
red clumps; Teff <5000K and log(^) < 3.3. A subset 
of these stars having 4500K< Teff <5000K and 1.625 < 
log(L/ L0) < 1.825 were identified as red clumps. FigurefTOl 
shows all-sky maps for the fraction of stars that are giants and 
red clumps, and the ratio of red clump to giants, for a RAVE- 
style survey with magnitudes in the range 9 < / < 13. The 
lower panels show the same but with dust extinction included. 
Except for the effect of the warp the plots being roughly sym- 
metrical about / = 180°, we concentrate on the first half only, 
i.e., 0° < / < 180°. It can be seen in Figure fTOl that the frac- 
tion of stars that are giants and red clumps decreases as a func- 
tion of / and |6| as one moves away from the galactic center 
and the galactic mid plane respectively. This is mainly due to 
the fact that giants being more luminous can be seen farthest 
in a magnitude limited sample, and for a given solid angle. 
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Fig. 10. — Angular distribution of giants and red clumps. Shown are the 
fraction of giants and red clumps, and the ratio of red clump to giants in a 
survey with magnitudes in the range 9 < / < 13. The upper panels are 
without dust extinction while the lower panels are with dust extinction. In the 
figures, the small scale structures lying away from the galactic mid plane are 
due to Poisson noise resulting from low number of stars. 



the volume sampled increases as cube of the distance. Hence, 
in directions where the disc extends the farthest, we should 
expect to see a higher fraction of giants. Red clumps on the 
other hand are less luminous than the giants which means that 
the ratio of red clumps to giants will be lower in regions where 
the fraction of giants is higher. Additionally, the red clumps 
are found in metal rich populations only. Now, due to the ex- 
istence of a vertical metallicity gradient in the disc, as one 
moves away from the galactic plane the fraction of metal rich 
population being surveyed decreases, this makes the ratio of 
RCs/giants to fall off as one moves away from the galactic 
mid-plane. Including dust extinction in general leads to faint 
stars getting excluded in a volume limited survey. Since a 
RAVE- style survey has a significant fraction of giants lying at 
larger distances, where they also appear faint, the effect of ex- 
tinction is to lower the overall fraction of giants (from 0.6 to 
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Fig. 11. — Angular distribution of giants and red clumps. Shown are the 
fraction of giants and red clumps, and the ratio of red clump to giants in 
a survey with SDSS r band magnitude in the range 4 < r < 22 . The 
upper panels are without dust extinction while the lower panels are with dust 
extinction. The color bar range for 1st and 3rd panels is to 0.1 and for 
2nd and 4th panels is to 1. In the figures, the small scale structures lying 
away from the galactic mid plane are due to Poisson noise resulting from low 
number of stars. 

0.4). As expected the overall ratio of red clump stars to giants 
increases slightly (from 0.3 to 0.4) on including extinction. 

Next, we repeat our analysis for an SDSS-style survey, with 
r band magnitude in the range 4 < r < 22 (Figure [TT]). The 
mean fraction of stars that are giants and red clumps is found 
to be 0.016, which increases to 0.034 when extinction is in- 
cluded. The mean ratio of red clumps to giants is 0.44, which 
with extinction included decreases only slightly to 0.42. Here 
the effect of extinction on the fraction of giants is opposite to 
what we saw earlier for the RAVE simulation. This is due to 
the fact that in an SDSS type survey the faint end of the mag- 
nitude distribution is dominated by main sequence dwarfs, 
which get excluded by the effect of extinction. On the other 
hand, given the extremely faint magnitude limit of the survey, 
the giants and red clumps in the disc are luminous enough to 
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Fig. 12. — Distribution in E — Lz space of particles lying within 150 kpc 
from the galactic center in one of the BJ05 simulations. 17 unique colors 
were used with each color representing a different satellite. Left panel shows 
the 20 most luminous satellite systems while the right panel shows the next 
40 most luminous systems. For each panel the range in luminosity in units of 
log(L/L0) and its percentage contribution to the total halo is also labelled. 
Each satellite system is sampled with about 1000 particles and the sampling 
probability of a particle with in a given satellite is proportional to its luminos- 
ity. The satellites were sorted in decreasing order of their luminosity before 
plotting, so as to make the lower luminosity satellite systems, which are ex- 
pected to cover a smaller area in the map, lie on top of higher luminosity 
systems. 

be visible even after including the effects of extinction. 

5.2. Identification of substructures in the stellar halo in 
E — Lz space with GAIA 

As an application of our code, we assess the ability of the 
astrometric mission G AIA to identify structures in the stel- 
lar halo. Helmi & de Zeeuwl (12000) first performed such an 
analysis and concluded that given the accuracy of GAIA, it 
would be relatively easy to identify the structures in integrals 
of motion space, e.g., energy E vs the z component of angu- 
lar momentum L^. Subsequently. iBrown e t al.'('2005) showed 
that in the £^ — space favored by Helmi & de Zeeuwl 
12000), the structures are nearly washed out if the back- 
ground population of sta rs are taken into account. While 
iHelmi & de Zeeuwl (|2000|) assume d the halo to be form ed out 
of in-falling satellites exclusively, iBrown et"an (|2005|) super- 
imposed a few disrupted satellites onto an otherwise smooth 
stellar halo. Hence neither of the simulations were done in a 
proper cosmological context. Recently Gomez et al.l (|2010|) 
(GHBL h ereafter) have done higher resolution simulations 
similar to iHelmi & de Zeeuwl (|2000l) along with some en- 
hancements, in particular, a time dependent analytic potential, 
and satellite luminosities drawn from the present-day distri- 
bution of Milky Way satellites ( Koposov et al. 2008). They 
conclude that significant amount of structure can be seen in 
the {E^Lz^L) space. We r epeat a similar analy s is her e but 
using the N-body models of iBuUock & JohnstonI (12005 ) that 
simulate the halo in a cosmological context and hence have 
realistic time-dependence of rate of in-fall and luminosities of 
accreted satellites. 

First we investigate as to how clustered are the particles of 
a tidally disrupted satellite in E — Lz space and compare them 
with the results of GHBL. Specifically we reproduce Figure 4 
of GHBL showing the distribution in E — Lz space of particles 
in the simulations with different colors representing different 
satellites. Results from one of the BJ05 simulations used by 
us are shown in Figure [121 While GHBL show all 43 satel- 
lites simulated by them, we show 20 most luminous satellites 
in left panel and next 40 luminous in right panel. The 20 most 
luminous systems which constitute about 92% of the stellar 
mass are significantly less clustered than the other low lumi- 



nosity systems. The Fig-4 in GHBL in fact resembles the 
right panel, which means that energy and angular momentum 
is conserved less in BJ05 than in GHBL. Also a lot particles 
can be seen at low Lz and high E in BJ05 as compared to 
GHBL suggesting that there are more high energy radial or- 
bits in BJ()5 simulations. 

Given the differences in the results of GHBL and BJ05, it is 
imperative to look at the differences between the methodology 
of the two simulations, so as to isolate the cause. 

• Dynamical friction is ignored in GHBL whereas BJ05 
use a modified version of the Chandrashekhar dy- 
namical friction formula. Since dynamical friction is 
strongest for massive systems this will make luminous 
systems appear fuzzier, an effect seen in BJ05 simula- 
tions. This could be one of the reasons for the BJ05 
results being fuzzier in general than GHBL. 

• In GHBL the stellar particles of a satellite are not ex- 
plicitly embedded in dark matter potentials. In gen- 
eral one expects embedded satellites to take longer to 
disrupt and moreover they disrupt in parts with each 
pericentric passage redistributing the energy and hence 
leading to more non conservation of energy. 

• In BJ05 the satellites are evolved from the time since 
they were accreted to the host halo whereas in GHBL 
all satellites are evolved for 10 Gyrs. As a consequence 
in BJ05 one expects older systems to be fuzzier and 
young systems to be clustered. 

• In GHBL the orbit initial conditions are drawn from a 
distribution function corresponding to the density pro- 
file of the stellar halo at z=0 whereas in BJ05 they are 
motivated from cosmological simulations. The overall 
effect of this on the results is not clear. 

Next, we investigate the prospect of detecting the structures 
in the stellar halo with GAIA. The parallax and photometric 
errors of GAIA increase steeply beyond V ^ lb, hence GAIA 
will be most accurate for nearby and/or bright stars. Hence 
similar to GHBL we generate a solar neighborhood sample 
of stellar halo stars, using BJ05 simulations, with the follow- 
ing constraints- V < 16, My < 4.5 and r < 4 kpc. This 
resulted in a sample of 1.3 x 10^ stars, which is similar but 
slightly larger than that of GHBL (0.8 x 10^). In Figure [H] 
we show a density map of stars in the £^ — space. In the 
left panel the stars have same phase space coordinate as that 
of their parent N-body particle, whereas in the right panel the 
spawned stars are distributed in p hase space according to the 
scheme mentioned in Section [24l It can be seen that dispers- 
ing the spawned stars in phase space has a disastrous effect 
of smoothing out the distribution of stars. In the top panels, 
we show the distribution of stars a single satellite system. It 
can be seen that even in the original N-body coordinates (top 
left panel) the stars are not strongly clustered. Spawning stars 
from it makes the situation even more worse (top right panel). 
The problem stems from the limited numerical resolution of 
BJ05 simulations, specially a few massive ones which domi- 
nate most of the mass in the stellar halo. Within 4 kpc from the 
Sun, typically a satellite contributes about 200 particles, and 
the range of scales covered in velocity space is of the order of 
±200 km s~^. Now, considering the fact that we smooth over 
64 particles in phase space, it is easy to see that large amount 
of scatter is introduced when spawning multiple stars from a 
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Fig. 13. — Number density map in E — Lz space of stars lying within 
4 kpc from the sun in one of the BJ05 simulations. Top panels: stars from 
a single satellite. Bottom panels: all stars. In the left panels, stars have the 
same phase space coordinate as that of their parent N-body particles. This 
means multiple stars can have same phase space coordinates. Hence, for the 
aid of visibility, a dispersion of 30 kpc km s~-^ along Lz and 1000 km^ 
along E was added to the position of stars in the left panels. The color table 
represents log(pmax/p), p being the number density. 

single particle. Additionally, Lz being a product of position 
and velocity coordinates, even a small amount of dispersion 
in position and velocity coordinates can cause a large spread 
in the value of Lz. Incidentally, GHBL were able to identify 
significant substructure in their analysis. However, due to the 
above mentioned limitations we cannot do a faithful compar- 
ison with the results of GHBL. 

5.3. Substructures in the stellar halo in (x, z) and (r, Vr) 

space 



According to our discussion in Section 15.21 it is clear that 
when multiple stars are spawned from a single N-body par- 
ticle significant smoothing of structures occurs. Hence, it is 
imperative to check the regime in which the results are reli- 
able. 

In Sharma et al. ( 2010a,b), an application of the code was 
shown for identification of structures in 3d (x, z) space 
making use of the photometric information of stars. Note, 
when kinematics are not needed the sampling resolution can 
be improved by calculating the smoothing lengths in 3d po- 
sition space only. Moreover, a smaller number of smoothing 
neighbors, typically 32, can be used. This improves the spa- 
tial resolution by a factor of 3 on average when compared to 
smoothing lengths in 6d space with 64 neighbors. To check 
this in Figure we plot the surface density maps of stars 
of a simulated halo as sampled by Galaxia. A color limit of 
0.1 < ^ — r < 0.3 (SDSS g and r band) and magnitude limit 
of r < 27.5 was used to sample the main sequence stars, so as 
to fairly sample all the relevant stellar populations in the halo. 
A sub- sampling factor of /sub -sample = 0.25 was used which 
resulted in about 1.5x10^ stars. A slice with \y\ < 10 kpc was 
selected to make the plots. The top left panel shows the map 
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Fig. 14. — Number density map of stars in x — z space of a simulated 
halo sampled by Galaxia at a resolution of 1.5 x 10^ with different sampling 
schemes (NS,AS,SS). The bottom right panel is sampled in 6d while the rest 
of the panels are sampled in 3d. 

as produced when the stars have the same phase space coordi- 
nates as that of the original N-body particle (no scatter scheme 
NS). The top right panel shows the results with the default 
sampling scheme of Galaxia where the stars are scattered only 
when an N-body particle spawns more than one stars (selec- 
tive scatter scheme SS). The bottom left panel show the maps 
when all the spawned stars are scattered (all scatter scheme 
AS). The bottom right panel shows the map when the stars 
are sampled in 6d phase space. At such a high sampling res- 
olution nearly all N-body particles spawn more than one star, 
hence the SS-3d and AS-3d results are identical. Compar- 
ing SS-3d with NS panel it can be seen that all the visible 
structures in NS are adequately reproduced by Galaxia. How- 
ever, extremely low contrast structures in the outskirts in the 
form of shells are smoothed out. The SS-6d scheme also re- 
produces the prominent structures correctly, but in general its 
results are slightly more smoothed when compared to SS-3d, 
which is expected as working in higher dimensions erodes the 
spatial resolution. 

Next, we look at the prospect of analyzing the data when ra- 
dial velocity information is also available. Recently two point 
correlation function in (x, v^) 4d space has bee n used to 
quantify the amount of structure in the stellar halo ( X ue et all 
[20101; LCooper et al. 2010a; Starkenburg et al. 2009). Typi- 
cally when comparing theory with observations the age and 
metallicity distribution of all the N-body particles is assumed 
to be same. In BJ05 simulations each progenitor has its own 
stellar population and with Gala xia it is now possible to draw 
stars using this information. In ISharma et al.l (l2010al) it was 
shown that depending upon the color and magnitude cuts em- 
ployed the sampling probability is different for different ac- 
cretion events. 

In Figure[T5]we show the number density map of stars in the 
r — Vr space (radial distance and radial velocity with respect 
to galactic center). The top, middle and bottom panels show 
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Fig. 15. — Number density map of stars in r — Vr space (radial distance and radial velocity with respect to galactic center) of a simulated halo sampled by 
Galaxia at different resolutions (1.5 x 10^ to 1.5 x 10^) and different scattering schemes (NS,AS,SS). Label NS stands for no scattering scheme, where the stars 
are simply assigned the co-ordinates of its parent N-body particle. Label SS stands for the improved selective scattering scheme, where the stars are scattered 
only if its parent N-body particle generates more than one star. Finally, label AS stands for the scheme where all the spawned stars are scattered. 

have excellent reliability for resolutions N < 10^, and for 
N = 10^, moderate reliability only beyond 50 kpc. In other 
regimes slight smoothing can be expected but the results will 
still be reliable wherever the observational errors are larger 
than the smoothing effects induced by our star spawning pro- 
cess. 

We now discuss the nature of our scattering errors. The un- 
certainties that arise from our N-body sampling scheme are 
very different from normal observational errors. To begin 
with, our sampling scheme is more like an interpolation of 
a function. This means that if the function (density field) is 
smooth the errors are small. Next, if the number of points over 
which the function is known is large (numerical resolution of 
the simulations), the errors will again be small. Finally, the 
amount of scattering in our scheme is inversely proportional 
to the number density of particles, this means that the scatter- 
ing effects are inherently adaptive in space and this is one of 
the strengths of our scheme. For example, the detectability of 
a structure depends upon its density peak being correctly re- 
solved in space and this being a high density region has lower 
amount of scattering associated with it. Incidentally, the adap- 
tive nature of scattering can also be a weakness in some cases. 
For example, if a low particle density region of another sys- 



the results with NS, SS and AS scheme respectively. The sam- 
pling resolution increases from left to right in the plots. A sub- 
sampling factor of /sub-sample = 0.25,0.025,0.0025 and 
0.00025 were used which resulted in sample sizes of 1.5 x 10^, 
1.5 X 10^ ,1.5 X 10^ and 1.5 x 10^ respectively. The fraction 
of stars, that suffer from scattering in the default SS scheme, 
is 3, 25, 70 and 93% for sample sizes ranging from 1.5 x 10^ 
to 1.5 X 10^ (bound systems, which are anyway easily de- 
tectable, were excluded). Hence, for sample sizes below 10^ 
very few stars need to be scattered. This fact is also reflected 
in the maps in Figure[T5l where, at resolutions below 1.5x10^ 
the maps of NS and SS scheme are almost identical. In con- 
trast, the AS scheme significantly erodes the structures. At 
resolution of 10^, the SS scheme is better than AS but when 
compared to the NS scheme, structures below r < 50 kpc are 
eroded. At even higher resolution, the SS scheme is identi- 
cal to AS scheme, which is expected as 93% of the stars are 
scattered. At the highest resolution, most prominent struc- 
tures are correctly reproduced but the contrast of the fine scale 
structures is reduced. Note, these results are without observa- 
tional errors, which when included will anyway erode the fine 
scale, low contrast structures. To summarize, we think the re- 
sults from Galaxia when sampling the stellar halo in 6d space 
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tern contributes significantly to the number density of stars at 
the location of a given structure then it can decrease the con- 
trast of that structure. This is typically the case in our simu- 
lations, where the smooth background component of the halo 
is dominated by a few massive accretion events which are not 
adequately resolved. Hence the weakness is not really a draw- 
back of the scheme but more an artifact of inadequate particle 
resolution of simulations. The question as to what is the ap- 
propriate N-body resolution for simulating a structure so that 
it can be properly sampled by our scheme, is something which 
should be tested in future by numerical convergence studies of 
simulations with different resolutions. 

Finally, we discuss as to which type of accreting systems 
are most likely to suffer from scattering. The BJ05 simula- 
tions sample all satellites with typically same number of stel- 
lar particles (approximately 2 x 10^) irrespective of their lu- 
minosity, this means low luminosity systems are less likely to 
spawn multiple stars and suffer from scattering as compared 
to the high luminosity ones. Moreover, the low luminosity 
systems are less phase mixed and have high particle phase 
space density ( as they have smaller internal velocity disper- 
sions and are less affected by dynamical friction), and hence 
these systems will also have lower amount of scattering. Fi- 
nally, dynamical friction sinks the systems inwards and it be- 
ing stronger for the high luminosity systems, such systems 
will lie preferentially towards the inner regions of the halo 
than the outskirts. This explains the smoothing in the plots 
being more prominent in the inner regions (r < 50 kpc). 

6. DISCUSSION 

Presently, given a model the code generates a particular re- 
alization of it. While this has its own applications, e.g., testing 
instrument capabilities, correcting for systematics and mak- 
ing predictions for observations. But in the end one desires to 
learn more about the formation of our galaxy. And this would 
require refining or fine tuning the model as new data comes in. 
In this respect, our parameterized analytical models based on 
well understood physical principles easily lend themselves to 
model fitting procedures and are an invaluable tool for under- 
standing the formation of our galaxy. Such schemes have been 
successfully used to constrain the star formation rate and the 
age- v elocity relation by using the data in the solar neighbor- 
hood (|AumerSlinne3|200^ A fast 
and efficient method to generate surveys from parameterized 
models as presented here should be immensely useful for per- 
forming such model fitting analysis in future. More gener- 
ally one c ould use the Markov Chain Monte Carlo schemes 
(iLewis & B ridle 2002) to explore the parameter space of the 
analytical models. 

In order to facilitate the exploration of the parameter space 
there are two issues that need to be addressed. First, although 
one can vary the SFR and other model parameters that exist in 
Galaxia, dynamical consistency is not guaranteed. This can be 
done by using the scheme outl ined in lBienayme et al.l (Il987h 
(see also I Just & JahreiBi (120101) ). where they self-consistently 
calculate the local scale height (or alternatively the ellipticity) 
as a function of age. Note, in the above scheme the large scale 
dynamical self consistency is still not guaranteed. Secondly, 
although Galaxia is fast enough to generate multiple realiza- 
tions of a model but each time the model is changed the code 
needs to recompute the nodes which takes some time. This is 
also something that can be improved. 

Chemical evolution is also another thing that is presently 
not self consistent. Recently some promising semi-analytic 



schemes have been proposed in which chemical evolution i s 
done along with radial mixing (ISchonrich & Binneyl 120091) . 
Although presently in Galaxia we do not include models with 
radial mixing but we have shown that in principle it is possible 
to accommodate this within our framework. 

There are also some other improvements that can be 
done. Galaxia has yet to incorporate a detailed central 
bar (presently only a bar shaped bulge is included) and the 
spiral arms because these await better constraints from the 
AAOmega survey of the central disc (K.C. Freeman, per- 
sonal communication). Although presently in Galaxia, the 
thick disc is treated as a population of constant age in the 
manner of the Besangon model. T his is in agreement with 
iFreeman & Bland-HawthornI (|2002|), who consider the thick 
disc to be chemically distinct VBe nsbv et al.ir2003L l2005h and 
uniformly old (Gilmore et al. 1995 ]). However, th ere are al- 
ternative models, e.g., Schonrich & Binney (2009) model the 
thick disc by scattering stars from the inner regions of the disc. 
Here the thick disc is also old but not necessarily of a constant 
age. 

Finally, as an application we plan to use Galaxia to 
address one of the key goals of the HERMES project, 
i.e., recovering star clusters that have long since dispersed 
within the Galaxy. The ability to recover ancient star clus- 
ters for a given set of observational parameters can be 
tested using a mul t i-dim ensional group finding algorithm 
ISharma & JohnstonI (l2009i) . The power of this approach 
was rece ntly demonstrated in the conte xt of simulated dwarf 
galaxies ("Bland-Hawthorn et al.l l20103) . With Galaxia for a 
given observational survey one can estimate the mass in dif- 
ferent galactic components. Subsequently, simple prescrip- 
tions assuming an initial cluster mass function for stars along 
with the assumption of chemical homogeneity with in a clus- 
ter could be employed to simulate structures in the chemical 
abundance space. 

7. CONCLUSIONS 

It is only in recent times that the community has begun to 
undertake massive spectroscopic surveys of 10^ to 10^ stars. 
These in clude the RAVE surv ey on the UK Schmidt 1 . 3mtele- 
scope ( Steinmet z et al.l l2006'). the APOGEE (Maiew skTet al.l 
2010) and SEGUE surveys ( Yannv et al. 2009) on the Sloan 
2.3m telescope and the upcoming HERMES survey on th e 
AAT 3.9m telescope (" Freeman & Bland-HawthornI I2008h . 
Similar surveys are under discussion for the LAMOST and 
the and the CFHT (3.6m) telescopes. These surveys are 
to be complemented by huge astrom etric missions, in par- 
ticular, JASMINE (Yan o et al.|[2008h and GAIA (PerrvmanI 
120021), and all- sky photometric surveys from SkyMapper 
(Keller.etalJ I2007h . PanSTARRS and LSST (llvezic etal.1 
2009). Thus, within a decade, we will have a vast wealth of 
data over a significant fraction of the Galaxy. 

If we are to move beyond information to knowledge, a 
working framework will be essential. This framework needs 
to be sufficiently flexible that it can adapt to new develop- 
ments in the theory of galaxy dynamics, updates in stellar syn- 
thesis libraries, and ever-improving N-body or semi-analytic 
simulations. On the other hand, the framework should also 
be fast enough to generate huge amounts of data and multiple 
realizations in preparation for upcoming surveys. To this end, 
we have presented here a population synthesis code for gen- 
erating a synthetic catalog of stars from a given analytical or 
N body model of a galaxy. A set of theoretical isochrones are 
used to convert the given model into a catalog of stars. 
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Although schemes for converting an analytical model into 
stellar catalogs have been implemented in the past, but they 
give catalogs along specific lines of sight rather than a field 
of finite size and its too cumbersome to build a wide field by 
decomposing it into many line of sights. This makes them 
limited in their use for generating large wide area catalogs. 
To overcome this limitation, we implement a new algorithm 
for sampling the stars that on one hand generates stars that 
are smoothly distribution over the desired space and on the 
other hand, is very efficient at generating wide area surveys 
consisting of a large number of stars. As a concrete example 
we have implemented the Besangon model within Galaxia and 
shown that it can accurately reproduce Besan9on star counts. 
However, the design of Galaxia is flexible enough to simulate 
other alternative models also. Galaxia in general can accept an 
input model in the form of analytic functions for the density 
distribution, SFR , AMR and AVR. 

In addition to sampling an analytical model, an algorithm 
for sampling an N body model is also presented. The 
novel feature of this algorithm is its ability to sample the 
stars in accordance with the underlying phase space den- 
sity of the N-body particles. As an application we make 
use of the simula ted N-b ody models of the stellar halo by 
iBullock & JohnstonI (l2QQ5h which allows one to make de- 
tailed predictions about the structures in the stellar halo. Other 
than this, the N-body sampling scheme will also be useful to 
simulate known structures in the Milky Way, e.g., the streams 
of the Sagittarius dwarf galaxy. Although the N-body sam- 
pling scheme is quite robust but if the N-body resolution of 
the simulations is not adequate then this can lead to spurious 
smoothing of the structures being sampled. 

Using the N-body sampling scheme, we simulated a hypo- 
thetical GAIA catalog of stars within 4 kpc of the sun and 
analyzed the prospect of identifying structures in the energy 
and angular momentum space , as in .Gomez et al.. (^2010) . We 



find that due to limited numerical resolution of the simulated 
stellar halos, our sampling scheme leads to significant wash 
out of structures. However, even if the sampling scheme is 
not used, the distribution of the N-body particles of satellites 
in B 105 simulations is less clustered as compared to results re- 
ported by Gomez et al. (2010). This is something that should 
be investigated in future. 

Although, it is difficult to make very accurate predictions 
in the solar neighborhood, but we show that the B 105 simula- 
tions are suitable for analyzing deep large scale surveys of the 
stellar halo. Our analysis shows that the results from Galaxia 
when sampling the stellar halo in 6d space are very reliable for 
sample sizes of TV < 10^, e.g., a survey of RR Lyrae or BHB 
stars. In other regimes, slight smoothing can be expected but 
the results will still be reliable whenever the observational er- 
rors are larger than the smoothing effects induced by our star 
spawning process. Additionally, when working in 3d position 
space the smoothing effects can be reduced by calculating the 
smoothing lengths in 3d position space only. Our results show 
that in 3d position space most of the structures are accurately 
reproduced irrespective of sample sizes. 

In conclusion, we have presented a detailed framework 
that will allow present and future stellar surveys to be 
compared and evaluated consistently. To facilitate its 
wider use, we plan to release the Galaxia c ode publicly at 
|http : / /galaxia . sourcef orge . net] 

ACKNOWLEDGMENTS 

JBH is funded through a Federation Fellowship from the 
Australian Research Council (ARC). SS is funded through 
ARC DP grant 0988751 which supports the HERMES 
project. JBH also acknowledges the kind hospitality of Mer- 
ton College, Oxford, and a visiting professorship from the 
Leverhulme Foundation during the Hilary and Trinity terms 
2010. 



APPENDIX 

ROTATIONAL KINEMATICS OF THE DISC 



The simplest scheme to describe the azimuthal velocity distribution is to consider it in the form of a Gaussian along with a 
term to correct for the asymmetric drift (see equation |4]). A more thorough treatment is obtained by assuming the [Shul([T969h 
distribution function 



fsUE, L) = 4^^^exp ( t^^^:^ ) (Al) 



where is the energy of a circular orbit with angular momentum L and 

din Rl 

Rl = Rc{L) being the radius of the circular orbit. 
The rms radial dispersion and the surface density is then given by 



7'W = 2/ri + ^|, <A2, 



exp ( '^^L) ' 



X 



^,{L)-^{R,L) 



exp ( ' 2^r° ' ) dL. (A4) 
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(ISchonrich & Binneyl 120091) . Starting with values of (Jr{L) = cf'^{L) and = ^\L), (primed quanti ties being targe t 

distributions supplied a priori) one can solve iteratively for the values of Tj{L) and cfr{L) using the formalism of lDehnenl(| 1 999|) . 
The distribution of L and hence = L/i?, at a given point R is then given by 

^) - 2w^(i?,) I o^Rl) ) ^^'^ 

where ^c{L) = ^{Rl) + L^/2Rl and <&eff(i^, i^) = ^{R) + i^V^i?^ (ISchonrich & Binn eyll2009h. For a given rotation curve 
the potential is computed as ^{R) = — v'^{R)dR/R. For simplicity we assume a flat rotation curve with Vc = 226.84. 

Note, aR is also a function of age. Hence, in practise, we calculate the functions crR{L) and T,{L) for a finite set of ages, and 
then use the simple nearest neighbor interpolation to calculate the P(L, R) for any given age. 
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